{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Stock NeurIPS2018 Part 1. Data\n",
        "This series is a reproduction of paper *the process in the paper Practical Deep Reinforcement Learning Approach for Stock Trading*. \n",
        "\n",
        "This is the first part of the NeurIPS2018 series, introducing how to use FinRL to fetch and process data that we need for ML/RL trading.\n",
        "\n",
        "Other demos can be found at the repo of [FinRL-Tutorials]((https://github.com/AI4Finance-Foundation/FinRL-Tutorials))."
      ],
      "metadata": {
        "id": "Sy8r7_g5WjAT"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Part 1. Install Packages"
      ],
      "metadata": {
        "id": "2uH1KXctgnoJ"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "a2oD49e_N_05"
      },
      "outputs": [],
      "source": [
        "## install required packages\n",
        "!pip install swig\n",
        "!pip install wrds\n",
        "!pip install pyportfolioopt\n",
        "## install finrl library\n",
        "!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "import numpy as np\n",
        "import datetime\n",
        "import yfinance as yf\n",
        "\n",
        "from finrl.meta.preprocessor.yahoodownloader import YahooDownloader\n",
        "from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split\n",
        "from finrl import config_tickers\n",
        "from finrl.config import INDICATORS\n",
        "\n",
        "import itertools"
      ],
      "metadata": {
        "id": "j37flV31OJGW"
      },
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Part 2. Fetch data"
      ],
      "metadata": {
        "id": "wxsN8i7tg07U"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "[yfinance](https://github.com/ranaroussi/yfinance) is an open-source library that provides APIs fetching historical data form Yahoo Finance. In FinRL, we have a class called [YahooDownloader](https://github.com/AI4Finance-Foundation/FinRL/blob/master/finrl/meta/preprocessor/yahoodownloader.py) that use yfinance to fetch data from Yahoo Finance."
      ],
      "metadata": {
        "id": "fMNm9tCMXy8J"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "**OHLCV**: Data downloaded are in the form of OHLCV, corresponding to **open, high, low, close, volume,** respectively. OHLCV is important because they contain most of numerical information of a stock in time series. From OHLCV, traders can get further judgement and prediction like the momentum, people's interest, market trends, etc."
      ],
      "metadata": {
        "id": "CWVXUkzaZE8m"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Data for a single ticker"
      ],
      "metadata": {
        "id": "jRYlbdMpW9Np"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Here we provide two ways to fetch data with single ticker, let's take Apple Inc. (AAPL) as an example."
      ],
      "metadata": {
        "id": "1wo6pCQYXDbz"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Using yfinance"
      ],
      "metadata": {
        "id": "yzVRe90WXLB1"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "aapl_df_yf = yf.download(tickers = \"aapl\", start='2020-01-01', end='2020-01-31')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "SSl6mVV7XNw6",
        "outputId": "460c06eb-c71d-4ebb-fe17-481295d70cff"
      },
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\r[*********************100%***********************]  1 of 1 completed\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "aapl_df_yf.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 238
        },
        "id": "Rjutz22rXrpR",
        "outputId": "62aadc8c-b854-403d-ac73-6cf86d53fa22"
      },
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                 Open       High        Low      Close  Adj Close     Volume\n",
              "Date                                                                        \n",
              "2020-01-02  74.059998  75.150002  73.797501  75.087502  73.449394  135480400\n",
              "2020-01-03  74.287498  75.144997  74.125000  74.357498  72.735321  146322800\n",
              "2020-01-06  73.447502  74.989998  73.187500  74.949997  73.314873  118387200\n",
              "2020-01-07  74.959999  75.224998  74.370003  74.597504  72.970093  108872000\n",
              "2020-01-08  74.290001  76.110001  74.290001  75.797501  74.143898  132079200"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-4ade345d-daed-43d0-85d1-f352cbe7fcd3\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Open</th>\n",
              "      <th>High</th>\n",
              "      <th>Low</th>\n",
              "      <th>Close</th>\n",
              "      <th>Adj Close</th>\n",
              "      <th>Volume</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Date</th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>2020-01-02</th>\n",
              "      <td>74.059998</td>\n",
              "      <td>75.150002</td>\n",
              "      <td>73.797501</td>\n",
              "      <td>75.087502</td>\n",
              "      <td>73.449394</td>\n",
              "      <td>135480400</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2020-01-03</th>\n",
              "      <td>74.287498</td>\n",
              "      <td>75.144997</td>\n",
              "      <td>74.125000</td>\n",
              "      <td>74.357498</td>\n",
              "      <td>72.735321</td>\n",
              "      <td>146322800</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2020-01-06</th>\n",
              "      <td>73.447502</td>\n",
              "      <td>74.989998</td>\n",
              "      <td>73.187500</td>\n",
              "      <td>74.949997</td>\n",
              "      <td>73.314873</td>\n",
              "      <td>118387200</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2020-01-07</th>\n",
              "      <td>74.959999</td>\n",
              "      <td>75.224998</td>\n",
              "      <td>74.370003</td>\n",
              "      <td>74.597504</td>\n",
              "      <td>72.970093</td>\n",
              "      <td>108872000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2020-01-08</th>\n",
              "      <td>74.290001</td>\n",
              "      <td>76.110001</td>\n",
              "      <td>74.290001</td>\n",
              "      <td>75.797501</td>\n",
              "      <td>74.143898</td>\n",
              "      <td>132079200</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-4ade345d-daed-43d0-85d1-f352cbe7fcd3')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-4ade345d-daed-43d0-85d1-f352cbe7fcd3 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-4ade345d-daed-43d0-85d1-f352cbe7fcd3');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 5
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Using FinRL"
      ],
      "metadata": {
        "id": "fHZLDmnsXOK0"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "In FinRL's YahooDownloader, we modified the data frame to the form that convenient for further data processing process. We use adjusted close price instead of close price, and add a column representing the day of a week (0-4 corresponding to Monday-Friday)."
      ],
      "metadata": {
        "id": "VFB77ohNbXCc"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "aapl_df_finrl = YahooDownloader(start_date = '2020-01-01',\n",
        "                                end_date = '2020-01-31',\n",
        "                                ticker_list = ['aapl']).fetch_data()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "7ufDHvt4XBWT",
        "outputId": "41603042-4f14-4814-c569-305d85fa7f9d"
      },
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\r[*********************100%***********************]  1 of 1 completed\n",
            "Shape of DataFrame:  (20, 8)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "aapl_df_finrl.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "_TgEjXxhXtT_",
        "outputId": "a8e8a9e2-a1ea-472e-eddf-2227e6c901d8"
      },
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "         date       open       high        low      close     volume   tic  \\\n",
              "0  2020-01-02  74.059998  75.150002  73.797501  73.449394  135480400  aapl   \n",
              "1  2020-01-03  74.287498  75.144997  74.125000  72.735313  146322800  aapl   \n",
              "2  2020-01-06  73.447502  74.989998  73.187500  73.314888  118387200  aapl   \n",
              "3  2020-01-07  74.959999  75.224998  74.370003  72.970078  108872000  aapl   \n",
              "4  2020-01-08  74.290001  76.110001  74.290001  74.143906  132079200  aapl   \n",
              "\n",
              "   day  \n",
              "0    3  \n",
              "1    4  \n",
              "2    0  \n",
              "3    1  \n",
              "4    2  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-99616e29-275e-48e3-81dd-9dcbcbc33fe9\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>date</th>\n",
              "      <th>open</th>\n",
              "      <th>high</th>\n",
              "      <th>low</th>\n",
              "      <th>close</th>\n",
              "      <th>volume</th>\n",
              "      <th>tic</th>\n",
              "      <th>day</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>2020-01-02</td>\n",
              "      <td>74.059998</td>\n",
              "      <td>75.150002</td>\n",
              "      <td>73.797501</td>\n",
              "      <td>73.449394</td>\n",
              "      <td>135480400</td>\n",
              "      <td>aapl</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2020-01-03</td>\n",
              "      <td>74.287498</td>\n",
              "      <td>75.144997</td>\n",
              "      <td>74.125000</td>\n",
              "      <td>72.735313</td>\n",
              "      <td>146322800</td>\n",
              "      <td>aapl</td>\n",
              "      <td>4</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>2020-01-06</td>\n",
              "      <td>73.447502</td>\n",
              "      <td>74.989998</td>\n",
              "      <td>73.187500</td>\n",
              "      <td>73.314888</td>\n",
              "      <td>118387200</td>\n",
              "      <td>aapl</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>2020-01-07</td>\n",
              "      <td>74.959999</td>\n",
              "      <td>75.224998</td>\n",
              "      <td>74.370003</td>\n",
              "      <td>72.970078</td>\n",
              "      <td>108872000</td>\n",
              "      <td>aapl</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>2020-01-08</td>\n",
              "      <td>74.290001</td>\n",
              "      <td>76.110001</td>\n",
              "      <td>74.290001</td>\n",
              "      <td>74.143906</td>\n",
              "      <td>132079200</td>\n",
              "      <td>aapl</td>\n",
              "      <td>2</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-99616e29-275e-48e3-81dd-9dcbcbc33fe9')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-99616e29-275e-48e3-81dd-9dcbcbc33fe9 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-99616e29-275e-48e3-81dd-9dcbcbc33fe9');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 7
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Data for the chosen tickers"
      ],
      "metadata": {
        "id": "9kcOE5nbic6R"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "config_tickers.DOW_30_TICKER"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "FKBjtAo2uIq5",
        "outputId": "927f682a-9cc3-4c11-c3f1-094ae811af6b"
      },
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['AXP',\n",
              " 'AMGN',\n",
              " 'AAPL',\n",
              " 'BA',\n",
              " 'CAT',\n",
              " 'CSCO',\n",
              " 'CVX',\n",
              " 'GS',\n",
              " 'HD',\n",
              " 'HON',\n",
              " 'IBM',\n",
              " 'INTC',\n",
              " 'JNJ',\n",
              " 'KO',\n",
              " 'JPM',\n",
              " 'MCD',\n",
              " 'MMM',\n",
              " 'MRK',\n",
              " 'MSFT',\n",
              " 'NKE',\n",
              " 'PG',\n",
              " 'TRV',\n",
              " 'UNH',\n",
              " 'CRM',\n",
              " 'VZ',\n",
              " 'V',\n",
              " 'WBA',\n",
              " 'WMT',\n",
              " 'DIS',\n",
              " 'DOW']"
            ]
          },
          "metadata": {},
          "execution_count": 8
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "TRAIN_START_DATE = '2009-01-01'\n",
        "TRAIN_END_DATE = '2020-07-01'\n",
        "TRADE_START_DATE = '2020-07-01'\n",
        "TRADE_END_DATE = '2021-10-29'"
      ],
      "metadata": {
        "id": "9xTPG4Fhc-zL"
      },
      "execution_count": 9,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "df_raw = YahooDownloader(start_date = TRAIN_START_DATE,\n",
        "                     end_date = TRADE_END_DATE,\n",
        "                     ticker_list = config_tickers.DOW_30_TICKER).fetch_data()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "9LblMI8CO0F3",
        "outputId": "7be76385-50eb-4e8d-f2e5-1795d77b70ba"
      },
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "Shape of DataFrame:  (94301, 8)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "df_raw.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "RD9cjHzt8X3A",
        "outputId": "051acda5-c8fd-440a-a5af-6be04cfdc018"
      },
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "         date       open       high        low      close     volume   tic  \\\n",
              "0  2009-01-02   3.067143   3.251429   3.041429   2.758536  746015200  AAPL   \n",
              "1  2009-01-02  58.590000  59.080002  57.750000  43.832630    6547900  AMGN   \n",
              "2  2009-01-02  18.570000  19.520000  18.400000  15.365304   10955700   AXP   \n",
              "3  2009-01-02  42.799999  45.560001  42.779999  33.941090    7010200    BA   \n",
              "4  2009-01-02  44.910000  46.980000  44.709999  31.579340    7117200   CAT   \n",
              "\n",
              "   day  \n",
              "0    4  \n",
              "1    4  \n",
              "2    4  \n",
              "3    4  \n",
              "4    4  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-07863913-3163-494f-b010-d2dae51bdfbf\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>date</th>\n",
              "      <th>open</th>\n",
              "      <th>high</th>\n",
              "      <th>low</th>\n",
              "      <th>close</th>\n",
              "      <th>volume</th>\n",
              "      <th>tic</th>\n",
              "      <th>day</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>2009-01-02</td>\n",
              "      <td>3.067143</td>\n",
              "      <td>3.251429</td>\n",
              "      <td>3.041429</td>\n",
              "      <td>2.758536</td>\n",
              "      <td>746015200</td>\n",
              "      <td>AAPL</td>\n",
              "      <td>4</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2009-01-02</td>\n",
              "      <td>58.590000</td>\n",
              "      <td>59.080002</td>\n",
              "      <td>57.750000</td>\n",
              "      <td>43.832630</td>\n",
              "      <td>6547900</td>\n",
              "      <td>AMGN</td>\n",
              "      <td>4</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>2009-01-02</td>\n",
              "      <td>18.570000</td>\n",
              "      <td>19.520000</td>\n",
              "      <td>18.400000</td>\n",
              "      <td>15.365304</td>\n",
              "      <td>10955700</td>\n",
              "      <td>AXP</td>\n",
              "      <td>4</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>2009-01-02</td>\n",
              "      <td>42.799999</td>\n",
              "      <td>45.560001</td>\n",
              "      <td>42.779999</td>\n",
              "      <td>33.941090</td>\n",
              "      <td>7010200</td>\n",
              "      <td>BA</td>\n",
              "      <td>4</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>2009-01-02</td>\n",
              "      <td>44.910000</td>\n",
              "      <td>46.980000</td>\n",
              "      <td>44.709999</td>\n",
              "      <td>31.579340</td>\n",
              "      <td>7117200</td>\n",
              "      <td>CAT</td>\n",
              "      <td>4</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-07863913-3163-494f-b010-d2dae51bdfbf')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-07863913-3163-494f-b010-d2dae51bdfbf button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-07863913-3163-494f-b010-d2dae51bdfbf');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 11
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uqC6c40Zh1iH"
      },
      "source": [
        "# Part 3: Preprocess Data\n",
        "We need to check for missing data and do feature engineering to convert the data point into a state.\n",
        "* **Adding technical indicators**. In practical trading, various information needs to be taken into account, such as historical prices, current holding shares, technical indicators, etc. Here, we demonstrate two trend-following technical indicators: MACD and RSI.\n",
        "* **Adding turbulence index**. Risk-aversion reflects whether an investor prefers to protect the capital. It also influences one's trading strategy when facing different market volatility level. To control the risk in a worst-case scenario, such as financial crisis of 2007–2008, FinRL employs the turbulence index that measures extreme fluctuation of asset price."
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Hear let's take **MACD** as an example. Moving average convergence/divergence (MACD) is one of the most commonly used indicator showing bull and bear market. Its calculation is based on EMA (Exponential Moving Average indicator, measuring trend direction over a period of time.)"
      ],
      "metadata": {
        "id": "1lQxLyWpdbAd"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "id": "PmKP-1ii3RLS",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "d514cf1a-8609-402e-ad58-df5f9100ec85"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Successfully added technical indicators\n",
            "[*********************100%***********************]  1 of 1 completed\n",
            "Shape of DataFrame:  (3228, 8)\n",
            "Successfully added vix\n",
            "Successfully added turbulence index\n"
          ]
        }
      ],
      "source": [
        "fe = FeatureEngineer(use_technical_indicator=True,\n",
        "                     tech_indicator_list = INDICATORS,\n",
        "                     use_vix=True,\n",
        "                     use_turbulence=True,\n",
        "                     user_defined_feature = False)\n",
        "\n",
        "processed = fe.preprocess_data(df_raw)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "id": "Kixon2tR3RLT"
      },
      "outputs": [],
      "source": [
        "list_ticker = processed[\"tic\"].unique().tolist()\n",
        "list_date = list(pd.date_range(processed['date'].min(),processed['date'].max()).astype(str))\n",
        "combination = list(itertools.product(list_date,list_ticker))\n",
        "\n",
        "processed_full = pd.DataFrame(combination,columns=[\"date\",\"tic\"]).merge(processed,on=[\"date\",\"tic\"],how=\"left\")\n",
        "processed_full = processed_full[processed_full['date'].isin(processed['date'])]\n",
        "processed_full = processed_full.sort_values(['date','tic'])\n",
        "\n",
        "processed_full = processed_full.fillna(0)"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "processed_full.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 357
        },
        "id": "HwKJNWJSabNK",
        "outputId": "16c8080e-91b0-4e8d-9a09-44939ac69801"
      },
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "         date   tic       open       high        low      close       volume  \\\n",
              "0  2009-01-02  AAPL   3.067143   3.251429   3.041429   2.758536  746015200.0   \n",
              "1  2009-01-02  AMGN  58.590000  59.080002  57.750000  43.832630    6547900.0   \n",
              "2  2009-01-02   AXP  18.570000  19.520000  18.400000  15.365304   10955700.0   \n",
              "3  2009-01-02    BA  42.799999  45.560001  42.779999  33.941090    7010200.0   \n",
              "4  2009-01-02   CAT  44.910000  46.980000  44.709999  31.579340    7117200.0   \n",
              "\n",
              "   day  macd   boll_ub   boll_lb  rsi_30     cci_30  dx_30  close_30_sma  \\\n",
              "0  4.0   0.0  2.981391  2.652102   100.0  66.666667  100.0      2.758536   \n",
              "1  4.0   0.0  2.981391  2.652102   100.0  66.666667  100.0     43.832630   \n",
              "2  4.0   0.0  2.981391  2.652102   100.0  66.666667  100.0     15.365304   \n",
              "3  4.0   0.0  2.981391  2.652102   100.0  66.666667  100.0     33.941090   \n",
              "4  4.0   0.0  2.981391  2.652102   100.0  66.666667  100.0     31.579340   \n",
              "\n",
              "   close_60_sma        vix  turbulence  \n",
              "0      2.758536  39.189999         0.0  \n",
              "1     43.832630  39.189999         0.0  \n",
              "2     15.365304  39.189999         0.0  \n",
              "3     33.941090  39.189999         0.0  \n",
              "4     31.579340  39.189999         0.0  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-adf71481-5e19-4f7e-8da8-a407d0e71404\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>date</th>\n",
              "      <th>tic</th>\n",
              "      <th>open</th>\n",
              "      <th>high</th>\n",
              "      <th>low</th>\n",
              "      <th>close</th>\n",
              "      <th>volume</th>\n",
              "      <th>day</th>\n",
              "      <th>macd</th>\n",
              "      <th>boll_ub</th>\n",
              "      <th>boll_lb</th>\n",
              "      <th>rsi_30</th>\n",
              "      <th>cci_30</th>\n",
              "      <th>dx_30</th>\n",
              "      <th>close_30_sma</th>\n",
              "      <th>close_60_sma</th>\n",
              "      <th>vix</th>\n",
              "      <th>turbulence</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>2009-01-02</td>\n",
              "      <td>AAPL</td>\n",
              "      <td>3.067143</td>\n",
              "      <td>3.251429</td>\n",
              "      <td>3.041429</td>\n",
              "      <td>2.758536</td>\n",
              "      <td>746015200.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>2.981391</td>\n",
              "      <td>2.652102</td>\n",
              "      <td>100.0</td>\n",
              "      <td>66.666667</td>\n",
              "      <td>100.0</td>\n",
              "      <td>2.758536</td>\n",
              "      <td>2.758536</td>\n",
              "      <td>39.189999</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2009-01-02</td>\n",
              "      <td>AMGN</td>\n",
              "      <td>58.590000</td>\n",
              "      <td>59.080002</td>\n",
              "      <td>57.750000</td>\n",
              "      <td>43.832630</td>\n",
              "      <td>6547900.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>2.981391</td>\n",
              "      <td>2.652102</td>\n",
              "      <td>100.0</td>\n",
              "      <td>66.666667</td>\n",
              "      <td>100.0</td>\n",
              "      <td>43.832630</td>\n",
              "      <td>43.832630</td>\n",
              "      <td>39.189999</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>2009-01-02</td>\n",
              "      <td>AXP</td>\n",
              "      <td>18.570000</td>\n",
              "      <td>19.520000</td>\n",
              "      <td>18.400000</td>\n",
              "      <td>15.365304</td>\n",
              "      <td>10955700.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>2.981391</td>\n",
              "      <td>2.652102</td>\n",
              "      <td>100.0</td>\n",
              "      <td>66.666667</td>\n",
              "      <td>100.0</td>\n",
              "      <td>15.365304</td>\n",
              "      <td>15.365304</td>\n",
              "      <td>39.189999</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>2009-01-02</td>\n",
              "      <td>BA</td>\n",
              "      <td>42.799999</td>\n",
              "      <td>45.560001</td>\n",
              "      <td>42.779999</td>\n",
              "      <td>33.941090</td>\n",
              "      <td>7010200.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>2.981391</td>\n",
              "      <td>2.652102</td>\n",
              "      <td>100.0</td>\n",
              "      <td>66.666667</td>\n",
              "      <td>100.0</td>\n",
              "      <td>33.941090</td>\n",
              "      <td>33.941090</td>\n",
              "      <td>39.189999</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>2009-01-02</td>\n",
              "      <td>CAT</td>\n",
              "      <td>44.910000</td>\n",
              "      <td>46.980000</td>\n",
              "      <td>44.709999</td>\n",
              "      <td>31.579340</td>\n",
              "      <td>7117200.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>2.981391</td>\n",
              "      <td>2.652102</td>\n",
              "      <td>100.0</td>\n",
              "      <td>66.666667</td>\n",
              "      <td>100.0</td>\n",
              "      <td>31.579340</td>\n",
              "      <td>31.579340</td>\n",
              "      <td>39.189999</td>\n",
              "      <td>0.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-adf71481-5e19-4f7e-8da8-a407d0e71404')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-adf71481-5e19-4f7e-8da8-a407d0e71404 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-adf71481-5e19-4f7e-8da8-a407d0e71404');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 16
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Part 4: Save the Data"
      ],
      "metadata": {
        "id": "ydLNxwdPIJhW"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Split the data for training and trading"
      ],
      "metadata": {
        "id": "VbMDnfukILc_"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "train = data_split(processed_full, TRAIN_START_DATE,TRAIN_END_DATE)\n",
        "trade = data_split(processed_full, TRADE_START_DATE,TRADE_END_DATE)\n",
        "print(len(train))\n",
        "print(len(trade))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "iEiXDdUhZP7R",
        "outputId": "554b1c09-6d6f-48fb-c724-351b40a2ddaf"
      },
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "83897\n",
            "9715\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Save data to csv file"
      ],
      "metadata": {
        "id": "DflbzEV8IRhF"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "For Colab users, you can open the virtual directory in colab and manually download the files.\n",
        "\n",
        "For users running on your local environment, the csv files should be at the same directory of this notebook."
      ],
      "metadata": {
        "id": "Tud3IZDzIUpd"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "train.to_csv('train_data.csv')\n",
        "trade.to_csv('trade_data.csv')"
      ],
      "metadata": {
        "id": "j2c12CpfHEjE"
      },
      "execution_count": 18,
      "outputs": []
    }
  ]
}
